5 research outputs found

    FPGA Acceleration of Domain-specific Kernels via High-Level Synthesis

    The abstract is in the attachment.

    Efficient FPGA Implementation of PCA Algorithm for Large Data using High Level Synthesis

    Principal Component Analysis (PCA) is a widely used method for dimensionality reduction in many application areas, including microwave imaging, where the input data is large. Despite its popularity, PCA is difficult to use because of its high computational complexity, especially for high-dimensional data. In recent years, several FPGA implementations have been proposed to accelerate PCA computation. However, most of them rely on manual RTL design, which requires more design and development time. In this paper, we propose an FPGA implementation of PCA using High Level Synthesis (HLS), which allows us to explore the design space more efficiently than with hand-coded RTL design. Starting from a PCA algorithm written in C++, we apply various hardware optimization techniques to the same code using Vivado HLS in order to quickly explore the design space. Our experiments show that the design obtained with the proposed method outperforms the state-of-the-art RTL design in terms of resource utilization, latency, and frequency.
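As a rough illustration of the kind of C++ starting point the abstract mentions, the following sketch computes a sample covariance matrix, one of the standard building blocks of PCA. It is not taken from the paper: the function name `covariance`, the template parameters, and the choice of `float` are assumptions made for this example, and the note about a pipelining directive only indicates where an HLS optimization might be applied.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sample covariance of N observations with D features each.
// Sizes are compile-time constants, as is common for HLS-style C++
// (hypothetical interface, not the paper's actual code).
template <std::size_t N, std::size_t D>
std::array<std::array<float, D>, D>
covariance(const std::array<std::array<float, D>, N>& x) {
    static_assert(N > 1, "need at least two observations");

    // Per-feature mean.
    std::array<float, D> mean{};
    for (std::size_t j = 0; j < D; ++j) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < N; ++i) sum += x[i][j];
        mean[j] = sum / static_cast<float>(N);
    }

    // Covariance entries; in an HLS flow, a loop-pipelining directive
    // could be attached to this loop nest (assumption, not shown here).
    std::array<std::array<float, D>, D> c{};
    for (std::size_t a = 0; a < D; ++a) {
        for (std::size_t b = 0; b < D; ++b) {
            float acc = 0.0f;
            for (std::size_t i = 0; i < N; ++i)
                acc += (x[i][a] - mean[a]) * (x[i][b] - mean[b]);
            c[a][b] = acc / static_cast<float>(N - 1);
        }
    }
    return c;
}
```

The eigen-decomposition of this matrix, which completes PCA, is omitted to keep the sketch short.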

    Development of an EM Device for Cerebrovascular Diseases Imaging and Hardware Acceleration for Imaging Algorithms within the EMERALD Network

    This paper presents the first months of research activities carried out by the Politecnico di Torino group within the Marie Skłodowska-Curie Innovative Training Network “EMERALD”. Our research concerns the development of an electromagnetic device for imaging cerebrovascular diseases and the hardware acceleration of the implemented imaging algorithms via field-programmable gate arrays or application-specific integrated circuits coupled with regular multicore central processing units and even graphics processing units.

    Multi-objective Framework for Training and Hardware Co-optimization in FPGAs

    Although several works have recently addressed the problem of performance co-optimization for hardware and network training for Convolutional Neural Networks, most of them considered either a fixed network or a given hardware architecture. In this work, we propose a new framework for joint optimization of network architecture and hardware configurations based on Bayesian Optimization (BO) on top of High Level Synthesis. The multi-objective nature of this framework allows for the definition of various hardware and network performance goals as well as multiple constraints, and the multi-objective BO makes it easy to obtain a set of Pareto points. We evaluate our methodology on a network optimized for an FPGA target and show that the Pareto set obtained by the proposed joint optimization outperforms other methods based on separate optimization or random search.
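To make the notion of a Pareto set concrete, the sketch below filters a list of candidate designs down to the non-dominated ones. It covers only that filtering step, not Bayesian Optimization itself, and it is not the framework described in the abstract: the `Candidate` struct and its two objectives (`latency` and `error`, both assumed to be minimized) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One evaluated design point; both objectives are minimized
// (hypothetical objectives chosen for this example).
struct Candidate {
    double latency;
    double error;
};

// True if `a` is at least as good as `b` in every objective
// and strictly better in at least one.
bool dominates(const Candidate& a, const Candidate& b) {
    return a.latency <= b.latency && a.error <= b.error &&
           (a.latency < b.latency || a.error < b.error);
}

// Returns the candidates that no other candidate dominates,
// in their original order (simple O(n^2) scan).
std::vector<Candidate> paretoSet(const std::vector<Candidate>& all) {
    std::vector<Candidate> front;
    for (std::size_t i = 0; i < all.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < all.size() && !dominated; ++j) {
            if (j != i) dominated = dominates(all[j], all[i]);
        }
        if (!dominated) front.push_back(all[i]);
    }
    return front;
}
```

The quadratic scan is adequate for the small candidate sets typical of an illustration; a larger search would likely sort by one objective first.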

    HLS-Based Flexible Hardware Accelerator for PCA Algorithm on a Low-Cost ZYNQ SoC

    Principal Component Analysis (PCA) is a widely used approach for dimensionality reduction in image processing. In microwave imaging, for example, it is used as an intermediate step toward image reconstruction. An FPGA hardware implementation of PCA is highly beneficial, especially as an accelerator for a low-cost embedded environment. In this paper, we propose a flexible PCA hardware accelerator that can be used for different input data dimensions and input precisions. In addition, it supports both floating-point and fixed-point arithmetic representations. The target hardware is a ZYNQ SoC. We used High Level Synthesis (HLS) to quickly explore the design space and thus find the best implementation for a given setting of the application parameters and the characteristics of the target hardware. We show the impact on performance of different hardware optimization techniques enabled by HLS. The proposed method outperforms a similar state-of-the-art HLS design in terms of latency and resource usage.
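One way a single C++ source could serve both floating-point and fixed-point arithmetic, as well as different data dimensions, is to template the kernel on the element type and size. The sketch below is an assumption-laden illustration rather than the accelerator's code: `Fixed` is a minimal hand-written stand-in for a fixed-point type (an HLS flow would more likely use the tool's own fixed-point library), and `dot` is a hypothetical kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal fixed-point number with FracBits fractional bits
// (illustrative stand-in; no overflow or rounding handling).
template <int FracBits>
struct Fixed {
    std::int32_t raw = 0;

    Fixed() = default;
    explicit Fixed(double v)
        : raw(static_cast<std::int32_t>(v * (1 << FracBits))) {}

    double toDouble() const {
        return static_cast<double>(raw) / (1 << FracBits);
    }

    friend Fixed operator+(Fixed a, Fixed b) {
        Fixed r;
        r.raw = a.raw + b.raw;
        return r;
    }

    friend Fixed operator*(Fixed a, Fixed b) {
        // Widen to 64 bits, multiply, then drop the extra fractional bits.
        Fixed r;
        r.raw = static_cast<std::int32_t>(
            (static_cast<std::int64_t>(a.raw) * b.raw) >> FracBits);
        return r;
    }
};

// Dot product that works for any type providing + and *,
// with the vector length fixed at compile time.
template <typename T, std::size_t N>
T dot(const std::array<T, N>& a, const std::array<T, N>& b) {
    T acc{};
    for (std::size_t i = 0; i < N; ++i) acc = acc + a[i] * b[i];
    return acc;
}
```

Calling `dot` with `float` arrays or with `Fixed<8>` arrays then selects the arithmetic at compile time without changing the kernel body.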